Cross-validation in penalized generalized linear models: Cross-validated penalized regression

Description

Cross-validating generalized linear models with L1 (lasso) and/or L2 (ridge) penalties, using likelihood cross-validation.

Usage

cvl (response, penalized, unpenalized, lambda1 = 0, lambda2 = 0,
    positive = FALSE, data,
    model = c("cox", "logistic", "linear", "poisson"), startbeta,
    startgamma, fold, epsilon = 1e-10, maxiter, standardize = FALSE,
    trace = TRUE, approximate = FALSE)

optL1 (response, penalized, unpenalized, minlambda1, maxlambda1,
    base1, lambda2 = 0, positive = FALSE, data,
    model = c("cox", "logistic", "linear", "poisson"), startbeta,
    startgamma, fold, epsilon = 1e-10, maxiter = Inf,
    standardize = FALSE, tol = .Machine$double.eps^0.25, trace = TRUE)

optL2 (response, penalized, unpenalized, lambda1 = 0, minlambda2,
    maxlambda2, base2, positive = FALSE, data,
    model = c("cox", "logistic", "linear", "poisson"), startbeta,
    startgamma, fold, epsilon = 1e-10, maxiter, standardize = FALSE,
    tol = .Machine$double.eps^0.25, trace = TRUE, approximate = FALSE)
    
profL1 (response, penalized, unpenalized, minlambda1, maxlambda1,
    base1, lambda2 = 0, positive = FALSE, data,
    model = c("cox", "logistic", "linear", "poisson"), startbeta,
    startgamma, fold, epsilon = 1e-10, maxiter = Inf,
    standardize = FALSE, steps = 100, minsteps = steps/2, log = FALSE,
    save.predictions = FALSE, trace = TRUE, plot = FALSE)

profL2 (response, penalized, unpenalized, lambda1 = 0, minlambda2,
    maxlambda2, base2, positive = FALSE, data,
    model = c("cox", "logistic", "linear", "poisson"), startbeta,
    startgamma, fold, epsilon = 1e-10, maxiter, standardize = FALSE,
    steps = 100, minsteps = steps/2, log = TRUE, save.predictions = FALSE,
    trace = TRUE, plot = FALSE, approximate = FALSE)

Arguments

response

The response variable (vector). This should be a numeric vector for linear regression, a Surv object for Cox regression, and a factor or a vector of 0/1 values for logistic regression.

penalized

The penalized covariates. These may be specified either as a matrix or as a (one-sided) formula object. See also under data.

unpenalized

Additional unpenalized covariates, specified in the same way as penalized. Note that an unpenalized intercept is included in the model by default (except in the Cox model). This can be suppressed by specifying unpenalized = ~0.

lambda1, lambda2

The fixed values of the tuning parameters for L1 and L2 penalization. Each must be either a single positive number or a vector with length equal to the number of covariates in the penalized argument. In the latter case, each covariate is given i

minlambda1, minlambda2, maxlambda1, maxlambda2

The values of the tuning parameters for L1 or L2 penalization between which the cross-validated likelihood is to be profiled or optimized.

base1, base2

An optional vector of length equal to the number of covariates in penalized. If supplied, profiling or optimization is performed between base1*minlambda1 and base1*maxlambda1; analogous for base2.
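
As a rough sketch of how base1 scales the search range, the base-R snippet below computes the per-covariate penalty vectors at the two ends of the range. The values of base1, minlambda1 and maxlambda1 are made up for illustration, and no package functions are called.

```r
# Hypothetical values, chosen only to illustrate the scaling.
base1 <- c(1, 1, 2, 0.5)   # one entry per penalized covariate
minlambda1 <- 1
maxlambda1 <- 10

lower <- base1 * minlambda1   # penalty vector at the start of the range
upper <- base1 * maxlambda1   # penalty vector at the end of the range

print(rbind(lower, upper))
```

Under these assumed values, a covariate with a base1 entry of 2 is penalized twice as heavily as one with an entry of 1 throughout the search.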

positive

If TRUE, constrains the estimated regression coefficients of all penalized covariates to be non-negative. If a logical vector with the length of the number of covariates in penalized, constrains the estimated regression coefficie

data

A data.frame used to evaluate response, and the terms of penalized or unpenalized when these have been specified as a formula object.

model

The model to be used. If missing, the model will be guessed from the response input.

startbeta

Starting values for the regression coefficients of the penalized covariates. These starting values will be used only for the first values of lambda1 and lambda2.

startgamma

Starting values for the regression coefficients of the unpenalized covariates. These starting values will be used only for the first values of lambda1 and lambda2.

fold

The fold for cross-validation. May be supplied as a single number (between 2 and n) giving the number of folds, or, alternatively, as a length n vector with values in 1:fold, specifying exactly which subjects are assigned to whic

epsilon

The convergence criterion. As in glm. Convergence is judged separately on the likelihood and on the penalty.

maxiter

The maximum number of iterations allowed. Set by default at 25 when only an L2 penalty is present, infinite otherwise.

standardize

If TRUE, standardizes all penalized covariates to unit central L2-norm before applying penalization.

steps

The maximum number of steps between minlambda1 and maxlambda1 or minlambda2 and maxlambda2 at which the cross-validated likelihood is to be calculated.

minsteps

The minimum number of steps between minlambda1 and maxlambda1 or minlambda2 and maxlambda2 at which the cross-validated likelihood is to be calculated. If minsteps is smaller than step

log

If FALSE, the steps between minlambda1 and maxlambda1 or minlambda2 and maxlambda2 are equidistant on a linear scale, if TRUE on a logarithmic scale. Please note the different d

tol

The tolerance of the Brent algorithm used for minimization. See also optimize.

save.predictions

Controls whether or not to save cross-validated predictions for all values of lambda.

trace

If TRUE, prints progress information. Note that setting trace = TRUE may slow down the algorithm (but it often feels quicker).

approximate

If TRUE, the cross-validated likelihood values are approximated rather than fully calculated. Note that this option is only available for ridge models.

plot

If TRUE, makes a plot of cross-validated likelihood versus lambda.

Value

A named list. See details.

Details

All five functions return a list with the following named elements: lambda, cvl, fold, predictions and fullfit.

References

Goeman J.J. (2010). L-1 Penalized Estimation in the Cox Proportional Hazards Model. Biometrical Journal 52 (1) 70-84.

Examples


# More examples in the package vignette:
#  type vignette("penalized")

library(penalized)
data(nki70)
attach(nki70)

# Finding an optimal cross-validated likelihood
opt <- optL1(Surv(time, event), penalized = nki70[,8:77], fold = 10)
coefficients(opt$fullfit)
plot(opt$predictions)

# Plotting the profile of the cross-validated likelihood
prof <- profL1(Surv(time, event), penalized = nki70[,8:77],
    fold = opt$fold, steps=20)
plot(prof$lambda, prof$cvl, type="l")
plotpath(prof$fullfit)
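
The ridge counterparts cvl and profL2 can be used in the same way. The sketch below is an illustration rather than part of the package's own examples: it assumes the penalized package is installed, and the seed, the number of folds, the fixed lambda2 value and the lambda2 range are arbitrary choices. The fold allocation is built by hand as a length-n vector so that both calls use the same split.

```r
# Illustrative sketch; seed, fold count and lambda2 values are arbitrary assumptions.
if (requireNamespace("penalized", quietly = TRUE)) {
  library(penalized)   # attaches survival as a dependency, which provides Surv()
  data(nki70)

  # Fix the fold allocation up front: a length-n vector with values in 1:5.
  set.seed(1)
  n <- nrow(nki70)
  folds <- sample(rep(1:5, length.out = n))

  # Cross-validated likelihood at a single fixed ridge penalty.
  fit <- cvl(Surv(time, event), penalized = nki70[, 8:77],
             lambda2 = 10, fold = folds, data = nki70, trace = FALSE)
  print(fit$cvl)

  # Profile the cross-validated likelihood over a small range of lambda2.
  prof2 <- profL2(Surv(time, event), penalized = nki70[, 8:77],
                  minlambda2 = 1, maxlambda2 = 100, fold = folds,
                  steps = 5, data = nki70, trace = FALSE)
  plot(prof2$lambda, prof2$cvl, type = "l", log = "x")
}
```

Passing data = nki70 avoids relying on attach(), and reusing one fold vector keeps the cross-validated likelihoods from the two calls comparable.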
